System & Method for Real-Time Video Communications

ABSTRACT

Systems and methods for video communication services are presented herein. In particular, systems and methods in which multiple participants can simultaneously create and share video in real-time are presented herein. Other systems and methods are also presented herein.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 13/407,732, filed Feb. 28, 2012, entitled “System and Method for Real-Time Video Communications”, which claims the benefit of U.S. Provisional Application No. 61/447,664, filed on Feb. 28, 2011, and entitled “System and Method for Real-Time Video Communications”, which is incorporated herein by reference.

FIELD OF INVENTION

The present invention relates generally to video communication services. More particularly, the present invention relates to electronic devices, computer program products, and methods with which multiple participants can simultaneously create and share video in real-time.

BACKGROUND OF THE INVENTION

Demand for Real-Time Video

Explosive growth in consumer and business demand for real-time video on mobile and Internet devices has created exciting new commercial opportunities and major new technical challenges. As they pursue the integration of new real-time video capabilities (FIG. 1) for mobile/Internet communication, business collaboration, entertainment, and social networking, device manufacturers, network infrastructure providers, and service providers are struggling to meet customer expectations for higher quality real-time video across a wider range of devices and networks.

Limitations of Broadcast Video Solutions

Today's standard video processing and distribution technologies have been developed to efficiently support one-way video broadcast, not the two-way and multi-party video sharing required for real-time mobile and Internet user interaction. Traditional broadcast industry solutions have proven to be too computationally complex and bandwidth hungry to meet the device, infrastructure, or bandwidth requirements for commercially scalable real-time mobile/Internet video services.

Device, Network, and Video Fluctuations

Furthermore, the available computational resources on many devices, as well as the delay, jitter, packet loss, and bandwidth congestion over user networks, cannot be guaranteed to remain constant during a real-time video/audio communication session. In the absence of any adaptation strategy, both device and network loading can lead to significant degradation in the user experience. An adaptation strategy designed to address network fluctuations but not device loading fluctuations is ineffective, since it is often difficult to distinguish between these two contributors to apparent “lost packets” and other performance degradations. Adaptation to frame-to-frame fluctuations in inherent video characteristics can provide additional performance benefits.

Embodiments of the present invention comprise an all-software Real-time Video Service Platform (RVSP). The RVSP is an end-to-end system solution that enables high-quality real-time two-way and multi-party video communications within the real-world constraints of mobile networks and public Internet connections. The RVSP includes both Client and Server software applications, both of which leverage low-complexity, low-bandwidth, and network-adaptive video processing and communications methods.

The RVSP Client (FIG. 2) integrates all: video and audio encode, decode, and synchronization functions; real-time device and network adaptation; and network signaling, transport, and control protocols, into a single all-software application compatible with smartphone and PC operating systems. The RVSP client application has been designed to accommodate fluctuations in: the internal loading of client devices; external impairments on a variety of different user networks; and inherent video characteristics such as frame-to-frame compressibility and degree of motion.

The RVSP Server (FIG. 3) integrates multiparty connectivity, transcoding, and automated video editing into a single all-software application. The all-software architecture of the RVSP supports flexible deployment across a wide range of network infrastructure, including existing mobile application/media server infrastructure, standard utility server hardware, or a cloud computing infrastructure. For both peer-to-peer and server-based real-time 2-way video share services and multi-party video conferencing, the RVSP platform reduces both the up-front capital expenditures (CapEx) and on-going operational expenditures (OpEx) compared to existing video platforms that utilize significantly higher bandwidths and require additional specialized video hardware in both the user devices and the network infrastructure.

In order to meet customer expectations for higher quality video across a wider range of devices and networks, mobile operators and other communication service providers worldwide have made significant new investments in IP Multimedia Subsystem (IMS) network infrastructure. By reducing bandwidth consumption and supporting higher concurrent user loading capabilities for a given infrastructure investment and bandwidth allotment in an IMS deployment (FIG. 4), the RVSP provides significant CapEx and OpEx reductions over competing real-time video platforms that require additional specialized video hardware in both the user devices and the network infrastructure.

The RVSP also delivers similar CapEx and OpEx benefits for “over the top” (OTT) and direct-to-subscriber deployments of real-time video services (FIG. 5) using standard utility server hardware or in a cloud computing infrastructure. In these cases, mobile devices communicating via public Internet or corporate networking infrastructure typically do not have access to video quality-of-service (QoS) enhancements in the mobile operator's IMS core. The real-time network adaptation features of the RVSP disclosed here then become critical to delivering a compelling user experience within the real-world constraints of mobile networks and consumer Internet connections.

Video conferencing systems are evolving to enable a more life-like “Telepresence” user experience, in which the quality of the real-time video and audio communications and the physical layout of the meeting rooms are enhanced so that multiple remote parties can experience the look, sound, and feel of all meeting around the same table. As shown in FIG. 6, multi-user video conferencing systems typically require specially designed meeting rooms with dedicated video cameras, large size video displays, arrays of audio microphones and speakers, and specialized processing equipment for digitizing, compressing, and distributing the multiple video and audio streams over dedicated high-speed data network connections.

For many consumer and business applications, there is a need to extend higher quality multi-party video communications to participants using a wider variety of less-specialized video-enabled electronic devices, including mobile communications devices, laptop computers, PCs, and standard TVs. There is also a need to extend immersive business communications to support a wider range of consumer and professional collaboration and social networking activities.

When it comes to multi-party video communications, users of these less-specialized electronic devices encounter a number of drawbacks in the devices and in the user experience. For example, these devices may have a wide range of video processing capabilities, video display sizes, available connection bandwidths, and available connection quality-of-service (QoS). Furthermore, without the benefit of specially designed meeting rooms, creating a “perceptually pleasant” meeting experience is challenging. Many video conferencing systems rely on a static screen layout in which all participants are presented within an array of equal-sized video “tiles”, even though several participants may be passive listeners throughout much of the meeting and hence contribute very little. These “static” multi-party video default display layouts have many drawbacks, including:

1. All participants are displayed at the same image size, same image quality, and same video frame rate, regardless of their level of participation.
2. Individual participants have no control over the display layout on their own device.
3. A participant with the role of “moderator” cannot “give the floor” to individual participants, as they can in a face-to-face conference setting.
4. Participants cannot choose to focus on one other participant, as they can in a face-to-face conference setting.

When deployed together, the RVSP Client and Server applications enable multiple participants to simultaneously create and share high-quality video with each other in real-time, with many key aspects of a face-to-face user experience.

BRIEF DESCRIPTION OF THE DRAWINGS

To facilitate further description of the embodiments, the following drawings are provided, in which like numbers in different figures indicate the same elements.

FIG. 1 illustrates examples of real-time video services.

FIG. 2 illustrates an example of a real-time video service platform client according to an embodiment of the present invention.

FIG. 3 illustrates an example of a real-time video service platform server application according to an embodiment of the present invention.

FIG. 4 illustrates an example of a system according to an embodiment of the present invention.

FIG. 5 illustrates an example of a system according to an embodiment of the present invention.

FIG. 6 illustrates an example of a multi-party communication system.

FIG. 7 illustrates examples of network impairments.

FIG. 8 illustrates an example of measured variations in compressed video frame size generated for a constant level of perceived image quality.

FIG. 9 illustrates examples of video quality and user experience degradations.

FIG. 10 illustrates an example of a system according to an embodiment of the present invention.

FIG. 11 illustrates an example of differences in network congestion.

FIG. 12 illustrates an example of a system according to an embodiment of the present invention.

FIG. 13 illustrates an example of a rate function.

FIG. 14 illustrates an example of a video encoder according to an embodiment of the present invention.

FIG. 15 illustrates an example of a network configuration.

FIG. 16 illustrates an example of a network configuration.

FIG. 17 illustrates an example of measured output video bit rate vs. target bit rate.

FIG. 18 illustrates an example of measured output video bit rate vs. target bit rate.

FIG. 19 illustrates an example of a system with voice activity detection according to an embodiment of the present invention.

FIG. 20 illustrates an example of a system with moderator selection according to an embodiment of the present invention.

FIG. 21 illustrates an example of a system with participant selection according to an embodiment of the present invention.

DETAILED DESCRIPTION

RVSP Client Application

As illustrated in FIG. 2, the RVSP Client integrates all: video and audio encode, decode, and synchronization functions; real-time device and network adaptation; and network signaling, transport, and control protocols, into a single all-software application compatible with leading smartphone and PC operating systems. The highly modular and open API architecture of the RVSP Client supports rapid and flexible device and service customization. Key components of the RVSP Client application include the Digital Technology Media Engine (DTME), Application Layer, and Device Abstraction, OS Abstraction, and Network Abstraction modules. The RVSP Client can include more or fewer components than specifically mentioned herein.

Application Layer

The Application Layer provides the primary user interface (UI), and can be rapidly customized to support a wide range of real-time video applications and services with customer-specified User Experience Design (UxD) requirements. The Application Layer is implemented in Java to leverage the many additional capabilities included in today's mobile device and PC platforms. An example Application Layer for a mobile Video Chat service would include the following modules:

SIP, NAT (Session Control) Module: Ensures compatibility with real-time communications infrastructure deployed by mobile operators and Internet video service providers. Implements SIP-based call session provisioning, device registration, device and service capabilities exchange, call session management, and media routing. The RVSP Client has been successfully integrated with multiple SIP servers and other proprietary signaling protocol servers.

Call View Activities Module: Implements the User Interface (UI) for each application, allowing for customer-specific branding at both the device and service level.

Settings Module: Governs the user editable settings for each application or service. Settings are preserved in the device database and thus persistent.

Address Book Module: Interacts with both the native handset address book and any additional Network Address Book and Presence functions.

DTME

The DTME implements all media (video and audio) processing and delivery functions. The DTME collects media streams from their designated sources, encodes or decodes them, and delivers the encoded/decoded media streams to their designated destinations. Each media source may be a hardware device (camera, microphone), a network socket, or a file. Similarly, each media destination may be a hardware device (display, speaker), a network socket, or a file.

RTP/RTCP Stack: Enables efficient network operations, and interfaces directly with device input/output devices (camera, display, microphone and speaker) via a hardware abstraction layer. The RTP/RTCP stack also includes an Adjustable Jitter Buffer, which automatically sets the jitter buffer depth depending on network conditions determined by the RTA module.

Real-Time Adaptation (RTA) Module: In order to provide an industry-leading real-time mobile video user experience, the RVSP Client application includes a Real-Time Adaptation (RTA) Module designed to accommodate fluctuations in the internal loading of a variety of different client devices, external impairments on a variety of different user networks, and inherent video characteristics such as frame-to-frame compressibility and degree of motion. In the absence of real-time adaptation, device and network loading significantly degrade user experience in real-time mobile/Internet video services.

DTV-X Video Codec: The DTV-X Video Codec at the heart of the DTME dramatically reduces the computational complexity of high-quality, real-time video capture and playback, enabling all-software implementations on mobile handsets. The DTV-X codec dramatically reduces compressed image data size while retaining high picture quality, extends device and networked video storage capacity, realizes higher image-per-second live monitoring/playback, enables faster download speeds, and supports advanced video manipulation in the device and/or in the network.

Other Video Codecs: Since the video codec functions are fully abstracted in the DTME, the RVSP Client can be configured to utilize any other video codecs, such as H.263 and H.264, which are already integrated into handset or PC hardware. This feature enables support for the widest possible range of devices and legacy video service infrastructure.

Audio Codecs: In a similar manner, the audio codec functions are also fully abstracted in the DTME, so that the RVSP can be configured to utilize a wide range of embedded audio codecs and acoustic echo cancelation solutions.

The DTME communicates with the Application Layer through a well-defined Application Layer Interface (DTME API) for rapid and flexible customization across a wide range of real-time video applications. The DTME API also enables a “headless” client, allowing third parties such as handset OEMs and video service providers to develop their own custom applications.

Device Abstraction, OS Abstraction, and Network Abstraction Modules

These modules allow installation and interoperability of the RVSP Client on devices running all of today's leading smartphone and PC operating systems. They also allow the RVSP Client to accommodate the wide range of cameras, displays, and audio hardware found in smartphones and PCs, and allow real-time video services to leverage the widest possible range of 3G, 4G, WiFi, DSL, and broadband network connectivity modes.

RVSP Server Application

As shown in FIG. 3, the RVSP Server integrates multiparty connectivity, transcoding, and automated video editing into a single all-software application that can be deployed either on existing mobile operator server infrastructure or on standard utility servers in a cloud computing infrastructure.

Many real-time video services require support for additional network-based video processing, including:

- multiparty connectivity
- transcoding
- automated video editing
- multimedia mashups
- connectivity to legacy video conferencing systems.

The RVSP Server provides these functions in an industrial-strength solution built using standards-based software, without the use or added expense of a hardware-based MCU or MRF appliance. An all-software RVSP Server solution enables customers to purchase only the number of ports they need, then grow into additional ports as the user base expands. The RVSP Server solution's flexible capacity management also enables video service providers to roll out services spanning mobile video chat through broadband HD videoconferencing on a single platform.

RVSP Server Specifications

- SIP compatible, multipoint video and voice conferencing, transcoding, and automated media mixing and editing
- On-demand, personal meeting rooms or one-click, ad-hoc conferences
- Personal layout selection with continuous presence, automatic layout adaptation based on number of conference participants
- Large conference support up to the capacity of the MCU
- Up to 720p30 transmit and receive resolutions, call rates up to 4 Mbps
- Selectable 4:3 and 16:9 aspect ratio for transmitted video
- DTV-4, H.264, H.263+, and H.263++ video codecs
- AMR, AAC-LC, G.711, G.722, G.722.1c, MP3 audio codecs
- SIP Registration and proxy support
- Web-based, remote configuration and management
- Multi-level administrative access control using Windows domain and local host authentication authorities
- Usage and system logging to Microsoft SQL Server 2008
- Configurable DiffServ settings for audio and video
- Endpoint API via SIP CSTA for advanced conference management
- REST API for management integration

System Requirements

- Operating System: Linux, Windows Server 2003 R2/2008
- Processor: Dual Core processor or higher required for operation; 2.5 GHz Xeon processor or higher required for HD video support
- Concurrent user capacity varies based on available processor speed and number of available cores
- Resource usage varies by selected resolution
- Memory: 4 GB
- Diskspace: 2 GB
- Network: Single, 100 Mbps network adapter with full duplex connectivity and a static IP address
- Virtual Servers: Supported; dedicated resources required

Additional RVSP Server benefits include:

- Natural Interactions: High quality media experience across a wide range of devices and networks
- Standards Based: Supports existing conferencing standards and interfaces to legacy conferencing systems
- Right-sized Buying: Flexible deployment model empowers customers to license only the ports they need
- Scalability: Easily add host server processing power to increase RVSP Server capacity
- Flexible Capacity Management: Ensures optimal resource usage
- Transcoding/Transrating: For each port, ensures that endpoints receive the best possible experience based on their capabilities.

Real-Time Adaptation Sub-System

A Real-Time Adaptation (RTA) sub-system has been integrated into the RVSP client application to enable prediction and/or measurement of, and adaptation to, fluctuations in the following device/network impairments and video characteristics:

Device Impairments

Existing real-time video client applications running on commercially available smart phones, tablets, and other video-enabled devices suffer from many device impairments, including:

- Differences between front and rear cameras. Some devices have front cameras limited to 15 fps and VGA (640×480 pixels) image sizes, while rear cameras on the same devices can support up to 30 fps and larger image sizes.
- Limited control of camera frame rate. Some camera modules, once activated in video mode, deliver a constant frame rate (e.g., 30 fps) regardless of what frame rate is requested by the calling application.
- Poor tracking of camera frame rate. Some camera modules, once activated in video mode, do not accurately track and maintain the requested frame rate. Deviations between requested and delivered video frame rates may also be influenced by processor loading due to other applications running on the device.
- CPU loading during camera operation. Some camera modules, once activated in video mode, automatically activate additional video processing functions in the device that can lead to significant processor loading. This loading in turn can limit overall real-time video applications to lower frame rates than targeted.

Real-time video application degradations resulting from failure to adapt to device impairments include:

- discrepancies between uncompressed video frame rates requested by the real-time video client application and the actual frame rates delivered by device camera modules
- uncompressed video frames that are delivered to the real-time video client by the device camera module, but cannot be passed to the video encoder due to timing limitations
- compressed video frames that arrive in the real-time video client, but cannot be passed to the video decoder due to timing limitations.

Network Impairments

Existing real-time video services running on commercial wireless (3G, 4G, WiFi) and wireline (DSL, broadband) networks suffer from many network impairments (FIG. 7), including:

- Packet delay & jitter in the network.
- Outright packet loss in the network.
- Other “network congestion”. Traffic contention due to the presence of other data traffic can manifest itself as a decrease in available network bit rate and/or decrease in data stream signal-to-noise ratio (SNR). Traffic contention may also manifest itself as increased packet delay/jitter and packet loss in the network.
- Asymmetry between uplink & downlink characteristics for each party on a real-time video call session.

Real-time video application degradations resulting from failure to adapt to network impairments include:

- media packets/audio & video frames that arrive in the receiver's client device but are sufficiently delayed/out of order that the client application is forced to ignore them and not pass them to the decoder
- media packets/audio & video frames, and control/signaling information that never arrive in the receiver's client device
- wide variations in the quality of individual participants' video streams on a multi-party video conference.

Variations in Inherent Video Characteristics

Testing on commercially available smart phones, tablets, and other video-enabled devices has revealed that, depending on typical frame-to-frame variations in inherent video data characteristics such as the relative degree of luma and chroma detail and frame-to-frame motion, the bits/frame required to maintain a constant level of user-perceived image quality can vary significantly (FIG. 8).

Real-time video application degradations that can result from failure to adapt to variations in such characteristics as frame-to-frame compressibility and degree of motion include:

- the real-time video client attempting to drive target bits/frame or frames/second to unnecessarily high or unattainably low levels during a video call session.

The many real-time video quality and user experience degradations that result from failure to adapt to device and network impairments, and fluctuations in inherent video characteristics, include: stalling, dropped frames, dropped macro-blocks, image blockiness, and image blurriness (FIG. 9).

RTA Sub-System Design Strategy

Successful real-time adaptation to the above impairments and fluctuations requires that the RTA sub-system in the RVSP Client application simultaneously analyze fluctuations of multiple video, device and network parameters, via measurement and/or prediction, in order to continuously update and implement an overall real-time video adaptation strategy. FIG. 10 illustrates the RTA sub-system inputs and control outputs.

Device Impairments: The RTA sub-system analyzes the behavior of uncompressed and compressed audio and video frames being generated, transmitted, stored, and processed within the device to detect and adapt to fluctuations in device loading. The RTA sub-system adapts to the measured fluctuations in device loading via corresponding:

(i) internal modifications to the target compressed frame rate to be generated and sent from the device to another RVSP-enabled user.
(ii) requested modifications to the target compressed frame rate to be generated and sent to the device from another RVSP-enabled user.

Network Impairments: The RTA sub-system analyzes the audio and video RTP media packets and corresponding RTCP control packets being generated within, and transmitted between, RVSP-enabled client devices, in order to measure/predict and adapt to fluctuations in the network. The RTA sub-system adapts to the measured fluctuations in network performance via corresponding modifications to:

(i) Targeted uncompressed video frame rate (fps) to be delivered by the camera to the DTV-X video encoder.
(ii) Targeted compressed video bits/frame to be delivered by the DTV-X video encoder. Several encoding parameters determine the compressed video bits/frame, including:
    - Quantization parameter Q
    - progressive refresh parameters
    - saliency parameters
    - PN frame ratios
    - I frame insertion
(iii) Targeted video data packet size to be generated by the RVSP client application's RTP/RTCP module for network transmission to another user.
(iv) Video frame/stream format requested from other user:
    - Send/resend I frame
(v) Frame buffers in RVSP media framework
(vi) Packet buffers in RVSP RTP stack
(vii) RTCP messages in RVSP RTP stack

Inherent Video Characteristics: The DTV-X video encoder analyzes frame-to-frame variations in the inherent compressibility of uncompressed video frame sequences being delivered from the camera module, and communicates this information to the RTA sub-system. The RTA sub-system utilizes this information to prevent the RVSP client from attempting to drive target bits/frame or frames/second to unnecessarily high or unattainably low levels during a call session. The inherent compressibility will vary with the relative degree of luma and chroma detail and/or the relative degree of motion in a sequence of video frames.

Successful real-time adaptation within the RVSP Client application requires that the above analysis and feedback be implemented as a set of collaborating processes within and between the RTA sub-system, the DTV-X video codec, the RTP/RTCP module, and the Session Control module. During a real-time video session, the RTA sub-system first determines device and network limitations during call setup and capabilities exchange between the participating devices. Once a real-time video session has been established, the RTA sub-system continues to analyze and adapt to fluctuations in device and network impairments and video parameters.

Determining Device/Network Limitations during Call Setup: During call setup and capabilities exchange, the RVSP client application determines the media bandwidth appropriate to the targeted user experience that can be supported by the device(s) and network(s) participating in the video call session. For each video call session, this bits/second target is then utilized by the RVSP client application(s) to establish:

- the initial video frame resolution and frame rate targets (for camera interfacing)
- the initial bits/frame target (for DTV-X codec interfacing)
- the initial bytes/packet target (for RTP/RTCP module interfacing).

The initial bits/second, frames/second, bits/frame, and bytes/packet targets should generally not be chosen to correspond to the maximum rates that are expected to be supported by the device(s) and network(s) participating in the video call session. Instead, the initial targets should be chosen so as to guarantee a high probability that they will actually be met, in order to avoid prolonged periods at call startup where the RTA sub-system is “out of target” and delivering a degraded user experience.
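The relationship between the initial targets can be sketched as follows. This is a minimal illustration only: the function name, the `InitialTargets` container, and the single `headroom` factor (here defaulting to 0.7) are assumptions introduced for the example and are not specified by the text, which states only that initial targets should sit below the expected maxima.

```python
from dataclasses import dataclass


@dataclass
class InitialTargets:
    bits_per_second: int
    frames_per_second: int
    bits_per_frame: int
    bytes_per_packet: int


def initial_targets(max_bps, max_fps, max_packet_bytes, headroom=0.7):
    """Derive conservative start-of-call targets from expected maxima.

    `headroom` (hypothetical) is the fraction of each expected maximum
    that is actually targeted at call startup.
    """
    if not 0.0 < headroom <= 1.0:
        raise ValueError("headroom must be in (0, 1]")
    bps = int(max_bps * headroom)
    fps = max(1, int(max_fps * headroom))
    # The bits/frame target follows from the two rate targets.
    bits_per_frame = bps // fps
    bytes_per_packet = max(1, int(max_packet_bytes * headroom))
    return InitialTargets(bps, fps, bits_per_frame, bytes_per_packet)
```

For example, with a 50% headroom, expected maxima of 1 Mbit/s, 30 fps, and 1200 bytes/packet would yield startup targets of 500 kbit/s, 15 fps, roughly 33 kbit/frame, and 600 bytes/packet.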

Analyzing Device/Network Impairments and Video Parameters during Call:

The following RTA-related parameters are measured during each videocall:

Device Impairments

i. Camera Speed Degradation

Measured input is the difference between the uncompressed video frame rate requested from the camera and the actual uncompressed video frame rate that is delivered to the RVSP application for processing by the DTV-X video encoder.

Used to determine the maximum video frame rate that can be requested.

ii. Device Loading on Send Channel

Measured input is the fraction of the uncompressed video frames delivered by the camera that arrive within a time window suitable to be further processed by the DTV-X encoder.

Used to determine the maximum video frame rate that can actually be encoded and sent.

iii. Device Loading on Receive Channel

Measured input is the fraction of the compressed video packets successfully received and re-assembled into complete video frames by the RVSP application within a time window suitable to be further processed by the DTV-X decoder.

Used to determine the maximum video frame rate that can actually be decoded and displayed.
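The three device-impairment measurements above might be computed along the following lines. This is a sketch under stated assumptions: the function names, the representation of frame lateness as a list of millisecond offsets, and the rule that scales the nominal frame rate by the on-time fraction are all hypothetical choices made for illustration.

```python
def camera_speed_degradation(requested_fps, delivered_fps):
    """Shortfall between requested and delivered camera frame rate (fps)."""
    return max(0.0, requested_fps - delivered_fps)


def on_time_fraction(lateness_ms, window_ms):
    """Fraction of frames whose lateness falls within the usable window.

    Applies equally to uncompressed frames headed to the encoder (send
    channel) and re-assembled frames headed to the decoder (receive channel).
    """
    if not lateness_ms:
        return 1.0  # nothing observed yet: assume no loading
    on_time = sum(1 for d in lateness_ms if d <= window_ms)
    return on_time / len(lateness_ms)


def sustainable_fps(nominal_fps, fraction_on_time, floor_fps=1.0):
    """Assumed rule: scale the nominal rate by the on-time fraction."""
    return max(floor_fps, nominal_fps * fraction_on_time)
```

If only half of the frames arrive within the window at a nominal 30 fps, this rule would suggest a maximum of about 15 fps for that channel.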

Network Impairments

iv. Network Congestion

Measured inputs are the RTCP reports and internal video and audio packet output status utilized by each device to determine the number of packets that the device itself has already sent but are still in transit to the target receiving device.

Used to estimate available network bandwidth in order to update target bit rate in bits/sec and packet size in bytes/packet. Our own measurements have revealed that correlation between the transmission of audio and video packets on mobile networks is poor (FIG. 11). Packet tracing added to the DTME to report the number of Audio and Video packets in transit at any given time under multiple controlled and uncontrolled network conditions has shown that the fractional in-transit Audio and Video packet counts are not well correlated, and that both show a significant dependence on packet size for any given level of network congestion. At higher levels of network congestion, smaller video packet sizes result in improved overall video throughput. At lower levels of network congestion, efficient video throughput can be maintained with larger packet sizes.


Poor correlation between Audio and Video packets in transit can be used as an indication that network congestion is high and that the Video packet size is not small enough to ensure efficient video throughput at the current level of network congestion.

v. Uplink and Downlink Network Packet Loss

Measured inputs are the RTCP reports indicating the fraction of packetslost.

Used as an additional input to gauge the network congestion and the corresponding effective real-time network bandwidth. Also used to modify the “aggressiveness” of progressive refresh and the PN frame ratio.

vi. Uplink and Downlink Network Jitter

Measured input is the difference between arrival-time intervals (between successive packets, observed as they arrive on the receiver device) and capture-time intervals (between successive packets, as indicated by timestamps written by the sender device). These difference measurements are processed using a rolling average filter to calculate a “recently observed jitter”.

Used to adapt the depth of the RVSP jitter buffer in order to support packet re-ordering.

vii. Roundtrip, Uplink, and Downlink Network Delays

Measured via NTP-based time values provided in the RTCP sender report (RFC 1889, section 6.3.1; also see FIG. 2: Example for round-trip time computation). Let SSRC_r denote the receiver issuing this report. Source SSRC_n can compute the round-trip propagation delay to SSRC_r by recording the time A when this reception report block is received. It calculates the total round-trip time A-LSR using the last SR timestamp field (LSR in the RTCP Sender report), and then subtracts the delay since the last SR field (DLSR in the RTCP Sender report). The round-trip propagation delay is then given as (A-LSR-DLSR).

May be used to estimate the signaling delay that must be accounted for if/when an I-Frame resend request is made from one device to another.
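The round-trip computation above can be sketched in a few lines. This is a minimal illustration, not the RVSP implementation: the function name is hypothetical, and it assumes all three values are expressed in the 32-bit RTCP timestamp format (upper 16 bits are whole seconds, lower 16 bits are units of 1/65536 second).

```python
def rtcp_round_trip_seconds(arrival: int, lsr: int, dlsr: int) -> float:
    """Return the round-trip propagation delay (A - LSR - DLSR) in seconds.

    arrival: time A at which the reception report block was received
    lsr:     last SR timestamp echoed back by the receiver
    dlsr:    delay since last SR, as reported by the receiver
    All values are assumed to use 1/65536-second units in 32 bits.
    """
    # Masking to 32 bits keeps the result correct if the timestamps wrap.
    delta = (arrival - lsr - dlsr) & 0xFFFFFFFF
    return delta / 65536.0


if __name__ == "__main__":
    # A = 46864.500 s, LSR = 46853.125 s, DLSR = 5.250 s
    print(rtcp_round_trip_seconds(0xB7108000, 0xB7052000, 0x00054000))
```

With the sample values shown, the total elapsed time A-LSR is 11.375 s, of which 5.250 s was spent at the receiver, leaving 6.125 s of network round-trip delay.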

Inherent Video Characteristics

viii. Video Frame Compressibility

Measured via the internal compression parameters (quantization levels, prediction mode decisions, and others) and the resulting actual compressed frame size generated by the DTV-X encoder. The inherent compressibility will vary with the relative degree of luma detail, chroma detail, brightness, contrast, and/or the relative degree of motion in a sequence of video frames.

Used to alter the trade-off between bits/frame and frames/second targets during extended sequences of “highly compressible” frames. When frames are highly compressible, it may be advantageous to allocate fewer bits to each compressed image and choose a higher frame rate for smoother motion, within the given overall bit rate target. Conversely, when frames are difficult to compress well, it may be advantageous to reduce the frame rate and allocate more bits to each compressed frame.

ix. Relative Degree of Luma and Chroma Detail

Measured from the quantization levels and resulting compressed size in different wavelet transform subbands for the current video frame input to the DTV-X encoder.

Used to determine the minimum bits/frame required to provide good image fidelity.

x. Relative Degree of Motion

Measured from the motion channel in the saliency map for the current video frame input to the DTV-X encoder.

Used to determine the minimum frame rate required to provide good motion fidelity, and to support lower frame rates and higher bits/frame targets during extended sequences of “low motion” frames. The motion channel of the saliency map generation compares a filtered version of the current frame's luma against the same version of the previous frame or frames, estimating motion based on the magnitude of differences in the comparison.

Adaptation to Device/Network/Video Fluctuations during Call: The RVSP RTA sub-system processes the above inputs from the camera module, DTV-X codec, RTP/RTCP module, and RVSP packet/frame buffers in order to update its estimates of parameters (i)-(x) above.

Adapting the Jitter Buffer Depth: Based on the updated estimate of (vi), the RTA sub-system then either maintains or updates the RVSP packet/frame jitter buffer depth(s). If excessive jitter bursts are detected, and these cannot be accommodated by packet re-ordering in the jitter buffer set to its maximum depth, then the corresponding packets must be treated by the RVSP client as if they were lost. The RTA sub-system may send a request to the other user to send a new I-frame in order to reset the decode process at the receiver impacted by the burst. The roundtrip network delay estimate (vii) provides the device with a lower limit on how long it must expect to wait for the requested I-frame to be delivered, and thus how long it must rely on alternative mechanisms (saliency/progressive refresh/V-frames) to deal with the high packet loss.

Adapting to Video Frame Compressibility and Degree of Motion: Based on the updated estimates of (viii)-(x) above, the RTA sub-system then either maintains or updates the bits/frame and frames/sec targets. In order to deliver the best user experience using the least device and network resources, the RTA sub-system can maintain lower bits/frame targets during extended sequences of “highly compressible” frames (low relative degree of Luma and Chroma detail), and/or lower frames/sec targets during extended sequences of “low motion” frames.

RTA Sub-System Implementation

Key Modules

As shown in FIGS. 10 and 12, the RTA sub-system includes the following modules:

-   Automatic Bit Rate Adjustment (ABA)
-   Rate Function
-   Frame Rate Regulator
-   Compression Regulator
-   Jitter Buffer Control
-   Packet Size Control
-   Codec Control

Automatic Bit Rate Adjustment (ABA): The ABA evaluates two measurements of the network performance to determine the target bit rate for video transmission:

-   (i) Packet Loss Analysis: The receiver maintains a count of the received packets and a count of the gaps in the packet sequence numbering, which correspond to lost packets. Periodically, the receiver sends a report to the sender with the ratio of lost packets.
-   (ii) Network Buffer Fill Level Analysis: The receiver periodically sends a report to the sender with the sequence number of the last received packet. The sender compares this number to the last sent packet sequence number to approximate the number of packets remaining on the network en route to the receiver.

The ABA compares this bit rate target against the peer's computation to ensure this unit does not consume a disproportionate amount of the available network bandwidth. The ABA unit periodically notifies its peer of its determined target bit rate for video transmission. The peer compares its own target to this value, and when its own value is significantly larger, the peer lowers its own target correspondingly.

Rate Function: The Rate Function converts the target bit rate into a corresponding combination of frame rate and bytes/frame for the encoder output. As shown in FIG. 13, the rate function incorporates the following parameters:

-   Minimum bit rate (bits/sec)
-   Maximum bit rate (bits/sec)
-   Minimum frame rate (fps)
-   Maximum frame rate (fps)
-   Minimum compression level (bytes/frame)
-   Maximum compression level (bytes/frame)
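One way a rate function with these six parameters could be organized is sketched below. The exact curve is defined by the figure and is not reproduced in this text, so the linear interpolation of frame rate between its limits is purely an assumption for illustration, as are the names `RateLimits` and `rate_function` and the sample limit values.

```python
from dataclasses import dataclass


@dataclass
class RateLimits:
    """The six Rate Function parameters listed above (names are hypothetical)."""
    min_bps: float
    max_bps: float
    min_fps: float
    max_fps: float
    min_bytes_per_frame: float
    max_bytes_per_frame: float


def clamp(value: float, low: float, high: float) -> float:
    return min(max(value, low), high)


def rate_function(target_bps: float, lim: RateLimits) -> tuple[float, float]:
    """Map a target bit rate to (frames/sec, bytes/frame).

    Assumption: frame rate scales linearly with bit rate between the
    configured minimum and maximum; bytes/frame absorbs the remainder.
    """
    bps = clamp(target_bps, lim.min_bps, lim.max_bps)
    span = lim.max_bps - lim.min_bps
    t = (bps - lim.min_bps) / span if span > 0 else 1.0
    fps = lim.min_fps + t * (lim.max_fps - lim.min_fps)
    bytes_per_frame = clamp(bps / (8.0 * fps),
                            lim.min_bytes_per_frame,
                            lim.max_bytes_per_frame)
    return fps, bytes_per_frame


if __name__ == "__main__":
    limits = RateLimits(100_000, 1_000_000, 5, 30, 500, 10_000)
    print(rate_function(400_000, limits))
```

Clamping both outputs means the product of frame rate and bytes/frame can fall short of the target bit rate at the extremes, which is the kind of gap a downstream regulator would need to correct.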

Frame Rate Regulator: Because the output frame rate from the camera modules on many smartphones and tablets is often irregular, the Frame Rate Regulator provides intermediate frame buffering/processing in order to ensure that the DTV-X video encoder receives video frames at the fps rate targeted by the Rate Function.

Compression Regulator: The Compression Regulator monitors the encoder output and modulates the frames/sec and bytes/frame targets based on the recent frame compressibility history provided by the video encoder. The goal is to deliver the best user experience using the least device and network resources. For example, the RTA sub-system can maintain lower bits/frame and higher frames/second targets during extended sequences of “highly compressible” frames (low relative degree of Luma and Chroma detail), and/or lower frames/sec and higher bits/frame targets during extended sequences of “low motion” frames. Additionally, the Compression Regulator monitors and compares the actual uncompressed video frame rate delivered by the Camera and the actual compressed video frame rate delivered by the Encoder, and adjusts the bytes/frame target to achieve the target bit rate. The Compression Regulator can thus modify the Rate Function described above.

Jitter Buffer Control: The Jitter Buffer Control measures the difference between arrival-time intervals (between successive packets, observed as they arrive on the receiver device) and capture-time intervals (between successive packets, as indicated by timestamps written by the sender device). These difference measurements are processed using a rolling average filter to calculate a “recently observed jitter”. If the recently observed jitter increases, the temporal depth of the RVSP jitter buffer in the RTP/RTCP module is increased in order to support packet re-ordering over a larger number of packets. If the recently observed jitter decreases, the temporal depth of the RVSP jitter buffer in the RTP/RTCP module is decreased correspondingly.

Packet Size Control: The maximum transmission unit (MTU) is the largest packet size that can be transmitted over a network. Occasionally, the size of the video frame exceeds this maximum and the frame is split across several packets. The number of packets is first determined, and then the frame is split evenly across that number of packets. Packet size can also be reduced/increased to enable more efficient video transmission as network impairments increase/decrease.

Codec Control: The DTV-X video codec encoder accepts video frames (images) in sequence and produces compressed representations of them for transmission to the DTV-X video codec decoder. It has various control inputs and information outputs, in addition to the input and output of video frames, as can be seen in FIG. 14.

With each frame to be compressed, the encoder accepts a frame type request that can be “I-Frame”, “P-Frame”, or (in some embodiments) “V-Frame”. These designate options in the encoding process and the format of the resulting compressed frame. The encoder will produce an I-frame when requested. It may produce an I-frame when a P-frame was requested, if the compression process produces a better result as an I-frame; for example, in the case of a sudden scene change or cut. V-frames are reduced representations that should be used only in between I-frames or P-frames; in some embodiments, the encoder may produce a V-frame when a P-frame was requested.

With each frame to be compressed, the encoder accepts a target for the compressed size of the frame. This target may not be met exactly; if the video is changing in compressibility character, the actual compressed size may differ from the target by a significant factor.

The encoder accepts an input indication of the desired strength of Progressive Refresh, which is the fraction of each P-frame that should be compressed without reference to prior frames or encoding state. The reason for this is that if a frame is lost or cannot be decoded for any reason, the decoder will not have available the necessary state reference for correctly decoding frames that follow the missing frame. Progressive refresh allows the reference state to be refreshed partially in every P-frame, so that it is not necessary to send I-frames periodically. This makes the frame size and transmission bit rate more uniform, and adds robustness against lost packets.

With each compressed frame that it produces as output, the encoder delivers an indication of the frame type actually used for this frame, whether I-Frame, P-Frame, or V-Frame.

With each compressed frame that it produces as output, the encoder delivers an indication of the actual size to which the frame was compressed. This may be compared with the size target that was given as input, and used in a rate control tracking loop to keep the actual rate within tighter long-term bounds than the codec's single frame size targeting ability.

With each compressed frame that it produces as output, the encoder delivers an estimate of the compressibility of the frame. This can be used in deciding how to balance frames-per-second against bits-per-frame in the rate control process.

With each compressed frame that it produces as output, the encoder delivers an estimate of the motion activity of the frame, and of the detail levels in the frame. These can be used in deciding how to balance frames-per-second against bits-per-frame in the rate control process.
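The jitter measurement described for the Jitter Buffer Control can be sketched as follows. This is an illustrative sketch only: the class and function names, the window length, the use of the absolute difference, and the mapping from jitter to buffer depth (a scale factor with minimum and maximum bounds) are all assumptions, since the text specifies only that a rolling average is used and that depth follows the observed jitter.

```python
from collections import deque


class JitterEstimator:
    """Rolling average of |arrival interval - capture interval| in milliseconds."""

    def __init__(self, window: int = 16):
        self._diffs = deque(maxlen=window)  # most recent difference samples
        self._prev = None                   # (arrival_ms, capture_ms) of last packet

    def observe(self, arrival_ms: float, capture_ms: float) -> float:
        """Record one packet and return the recently observed jitter."""
        if self._prev is not None:
            arrival_interval = arrival_ms - self._prev[0]
            capture_interval = capture_ms - self._prev[1]
            self._diffs.append(abs(arrival_interval - capture_interval))
        self._prev = (arrival_ms, capture_ms)
        return sum(self._diffs) / len(self._diffs) if self._diffs else 0.0


def buffer_depth_ms(jitter_ms: float, factor: float = 2.0,
                    min_ms: float = 20.0, max_ms: float = 200.0) -> float:
    """Grow or shrink the jitter buffer depth with the observed jitter (assumed policy)."""
    return min(max(factor * jitter_ms, min_ms), max_ms)


if __name__ == "__main__":
    est = JitterEstimator(window=4)
    for arrival, capture in [(0, 0), (30, 20), (50, 40), (95, 60)]:
        jitter = est.observe(arrival, capture)
        print(jitter, buffer_depth_ms(jitter))
```

Because the depth is recomputed from a bounded window, the buffer shrinks again once a jitter burst has passed, rather than holding a fixed worst-case delay.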
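The even split described above (first determine the packet count, then divide the frame evenly) can be sketched as follows. The function name and the `max_payload` parameter are hypothetical; `max_payload` stands for the MTU minus whatever header overhead applies, which is not specified here.

```python
import math


def split_frame(frame: bytes, max_payload: int) -> list[bytes]:
    """Split a compressed frame evenly across the fewest packets that fit."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    if not frame:
        return []
    # Step 1: determine how many packets are needed.
    count = math.ceil(len(frame) / max_payload)
    # Step 2: spread the bytes evenly; the first `extra` packets carry one more byte.
    base, extra = divmod(len(frame), count)
    chunks, offset = [], 0
    for i in range(count):
        size = base + (1 if i < extra else 0)
        chunks.append(frame[offset:offset + size])
        offset += size
    return chunks


if __name__ == "__main__":
    packets = split_frame(bytes(3000), 1400)
    print([len(p) for p in packets])
```

Splitting evenly avoids a final runt packet: a 3000-byte frame with a 1400-byte limit becomes three 1000-byte packets rather than two full packets and one of 200 bytes.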

RTA Bit Rate Adjustment Algorithm Description

DEFINITIONS

Packet Loss rate is the fraction of the total transmitted packets that do not arrive at the intended receiver.

Network Buffer Fill Level is the number of bytes (or, in the case of uniform packet sizes, the number of packets) currently in transmission through the network.

Packet Loss Analysis: The packet loss ratio is taken directly from the ‘fraction lost’ field in the RTCP Sender or Receiver Report packet (SR: Sender report RTCP packet, Paragraph 6.3.1 of RFC 1889; RR: Receiver report RTCP packet, Paragraph 6.3.2 of RFC 1889). This value is average filtered:

$$LR_{new} = \alpha_L \cdot LR_{old} + (1 - \alpha_L) \cdot LR_{net} \qquad (1)$$

where:

-   $LR_{new}$ is the newly filtered Packet Loss ratio value;
-   $LR_{old}$ is the previous Packet Loss ratio value;
-   $LR_{net}$ is the Packet Loss ratio value from the RTCP receiver report;
-   $\alpha_L$ is a parameter specifying how aggressively the algorithm reacts to the latest reported value (smaller values weight the latest report more heavily), with $0 \le \alpha_L \le 1$.
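The filter in equation (1) is an exponentially weighted moving average, and a minimal sketch of it follows. The function names are hypothetical. The conversion helper relies on the RTCP convention that ‘fraction lost’ is carried as an 8-bit fixed-point number (the fraction multiplied by 256); the default of 0.5 for the filter parameter is taken from the exemplary embodiment given later in this description.

```python
def fraction_lost_to_ratio(fraction_lost_field: int) -> float:
    """Convert the 8-bit RTCP 'fraction lost' field to a ratio in [0, 1)."""
    if not 0 <= fraction_lost_field <= 255:
        raise ValueError("fraction lost is an 8-bit field")
    return fraction_lost_field / 256.0


def filter_loss_ratio(lr_old: float, lr_net: float, alpha_l: float = 0.5) -> float:
    """Equation (1): LR_new = alpha_L * LR_old + (1 - alpha_L) * LR_net."""
    if not 0.0 <= alpha_l <= 1.0:
        raise ValueError("alpha_l must lie in [0, 1]")
    return alpha_l * lr_old + (1.0 - alpha_l) * lr_net


if __name__ == "__main__":
    lr = 0.0
    for field in (0, 26, 26, 0):  # successive 'fraction lost' reports
        lr = filter_loss_ratio(lr, fraction_lost_to_ratio(field))
        print(round(lr, 4))
```

A single lossy report therefore moves the filtered ratio only part of the way toward the reported value, which damps reactions to isolated loss bursts.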

Network Buffer Fill Level Analysis:

The sender keeps track of the latest transmitted packet sequence number. The receiver reports the latest received packet sequence number in its RR report. The sender subtracts the receiver's number from its own to calculate the amount of data currently in transmission through the network. Since the report's value is inherently offset by the network delay between receiver and sender, the difference defines an upper estimate of the network buffer fill level. This value is average filtered:

$$N_{new} = \alpha_N \cdot N_{old} + (1 - \alpha_N) \cdot N_{net} \qquad (2)$$

where:

-   $N_{new}$ is the newly filtered Network Buffer fill level value;
-   $N_{old}$ is the previous Network Buffer fill level value;
-   $N_{net}$ is the freshly calculated Network Buffer fill level value;
-   $\alpha_N$ is a parameter specifying how aggressively the algorithm reacts to the newly calculated value (smaller values weight the new value more heavily), with $0 \le \alpha_N \le 1$.

Because the maximum Network Buffer fill level is not known, the latest network value is compared to the previous averaged value, and this difference, normalized by the newly filtered value, becomes the final result.

$$N_{fill} = \frac{N_{net} - N_{old}}{N_{new}} \qquad (3)$$
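Equations (2) and (3) can be combined into a short update step, sketched below. The function names are hypothetical, the 16-bit wrap-around handling assumes plain RTP sequence numbers, and the default filter parameter of 0.5 is an assumption because no value for it is stated in this description. Returning 0 when the filtered level is zero is likewise an assumed convention to avoid dividing by zero.

```python
def packets_in_transit(last_sent_seq: int, last_received_seq: int) -> int:
    """Packets sent but not yet reported as received (16-bit sequence numbers assumed)."""
    return (last_sent_seq - last_received_seq) & 0xFFFF


def update_fill_level(n_old: float, n_net: float,
                      alpha_n: float = 0.5) -> tuple[float, float]:
    """Return (N_new, N_fill) per equations (2) and (3)."""
    if not 0.0 <= alpha_n <= 1.0:
        raise ValueError("alpha_n must lie in [0, 1]")
    n_new = alpha_n * n_old + (1.0 - alpha_n) * n_net      # equation (2)
    n_fill = (n_net - n_old) / n_new if n_new > 0 else 0.0  # equation (3)
    return n_new, n_fill


if __name__ == "__main__":
    n_old = 10.0
    n_net = packets_in_transit(last_sent_seq=1020, last_received_seq=1000)
    print(update_fill_level(n_old, n_net))
```

A positive fill value indicates that the amount of data in flight is growing relative to its recent average, and a negative value indicates that it is draining.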

Packet Loss Adjustment Strategy:

Network packet loss is defined to be in one of the following three states:

-   Congested: The Packet Loss ratio is high and the transmission quality is low.
-   Fully Loaded: The Packet Loss ratio is affordable and the transmission quality is good.
-   Under Loaded: The Packet Loss ratio is very small or zero.

The estimate of the network packet loss conditions is based on the relative values of the filtered Packet Loss ratio, $LR_{new}$, and two threshold values, $LR_c$ (congested Packet Loss ratio) and $LR_u$ (under-loaded Packet Loss ratio):

if ($LR_{new} \ge LR_c$) → network congestion

if ($LR_c > LR_{new} \ge LR_u$) → network fully loaded

if ($LR_u > LR_{new}$) → network under loaded  (4)

According to one exemplary embodiment, the above parameters can be:

α_(L)=0.5, LR_(c)=0.05, LR_(u)=0.02.
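The three-state decision in (4) maps directly onto a small classifier. The sketch below uses the exemplary threshold values as defaults; the enum and function names are hypothetical.

```python
from enum import Enum


class LossState(Enum):
    CONGESTED = "congested"
    FULLY_LOADED = "fully loaded"
    UNDER_LOADED = "under loaded"


def classify_loss(lr_new: float, lr_c: float = 0.05, lr_u: float = 0.02) -> LossState:
    """Apply the thresholds of (4) to the filtered Packet Loss ratio."""
    if lr_new >= lr_c:
        return LossState.CONGESTED
    if lr_new >= lr_u:
        return LossState.FULLY_LOADED
    return LossState.UNDER_LOADED


if __name__ == "__main__":
    for ratio in (0.0, 0.02, 0.049, 0.05, 0.2):
        print(ratio, classify_loss(ratio).value)
```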

Network Buffer Fill Level Adjustment Strategy:

Network congestion is defined to be in one of the following four states:

-   Congested: The network fill level is high and the transmission quality is low.
-   Fully Loaded: The network fill level is affordable and the transmission usage is good.
-   Under Loaded: The network fill level is underutilized and transmission should increase slightly.
-   Very Under Loaded: The network fill level is underutilized and the transmission should increase significantly.

if ($N_{fill} \ge N_c$) → network congestion

if ($N_c > N_{fill} \ge N_u$) → network fully loaded

if ($N_u > N_{fill} \ge N_{vu}$) → network under loaded

if ($N_{vu} > N_{fill}$) → network very under loaded  (5)

In one exemplary embodiment, the above parameters are set to:

-   -   N_(c)=0.75; N_(u)=0.45; N_(vu)=0.15
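The four-state decision in (5) can be sketched in the same way, again using the exemplary thresholds as defaults. The enum and function names are hypothetical.

```python
from enum import Enum


class FillState(Enum):
    CONGESTED = "congested"
    FULLY_LOADED = "fully loaded"
    UNDER_LOADED = "under loaded"
    VERY_UNDER_LOADED = "very under loaded"


def classify_fill(n_fill: float, n_c: float = 0.75,
                  n_u: float = 0.45, n_vu: float = 0.15) -> FillState:
    """Apply the thresholds of (5) to the Network Buffer fill level result."""
    if n_fill >= n_c:
        return FillState.CONGESTED
    if n_fill >= n_u:
        return FillState.FULLY_LOADED
    if n_fill >= n_vu:
        return FillState.UNDER_LOADED
    return FillState.VERY_UNDER_LOADED


if __name__ == "__main__":
    for fill in (-0.2, 0.15, 0.5, 0.75):
        print(fill, classify_fill(fill).value)
```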

Combined Adjustment:

The combined algorithm includes both of the above two algorithms. Because “Network Buffer Fill Level” provides a more sensitive prediction of network congestion than “Packet Loss Ratio”, the RTA uses “Network Buffer Fill Level” as a primary adjustment, and “Packet Loss Ratio Adjustment” as a secondary adjustment, according to the following specific conditions.

if ($N_{fill} \ge N_c$) → use higher of two adjustments

if ($N_c > N_{fill} \ge N_u$) → use Network Buffer Fill Level adjustment  (6)
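The selection rule in (6) can be sketched as below, under several stated assumptions. The form of an “adjustment” is not defined in this description, so the sketch treats each one as a fractional bit-rate reduction and reads “higher” as the larger reduction. No rule is given for fill levels below the under-loaded threshold, so the sketch falls back to the primary (fill level) adjustment there. The function name is hypothetical.

```python
def combined_adjustment(n_fill: float, fill_adjustment: float,
                        loss_adjustment: float, n_c: float = 0.75) -> float:
    """Choose between the two adjustments following (6).

    Both adjustments are assumed to be fractional bit-rate reductions,
    e.g. 0.1 means "lower the target bit rate by 10%".
    """
    if n_fill >= n_c:
        # Congested: take whichever analysis asks for the larger reduction.
        return max(fill_adjustment, loss_adjustment)
    # Otherwise the Network Buffer Fill Level adjustment is primary.
    return fill_adjustment


if __name__ == "__main__":
    print(combined_adjustment(0.80, fill_adjustment=0.10, loss_adjustment=0.30))
    print(combined_adjustment(0.50, fill_adjustment=0.10, loss_adjustment=0.30))
```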

RTA Testing

Network Test Configuration

Network configurations used for RTA performance testing and evaluation are shown in FIGS. 15 and 16. A standard off-the-shelf Linux-based computer is used to implement the Network Impairment Router. Network impairment is realized using Traffic Control, a standard command-line utility on Linux. MasterShaper is a network traffic shaper that leverages Traffic Control and other Linux utilities to provide a Web Interface for Quality of Service (QoS) functions.

Two devices are used to conduct real-time Peer-to-Peer Video Call tests. Each device connects to a separate access point, forcing the video call path to go through the Network Impairment Router. Shell scripts leveraging Traffic Control/MasterShaper commands are used to control the Peer-to-Peer Video Call path. MasterShaper allows predetermined values and fluctuations of the bandwidth, delay, jitter, and packet loss to be set for the Video Call path.

iPerf is installed on both clients and used to validate the IFPW bandwidth. One iPerf instance is set up as the server and the other as the client. iPerf performs a test to measure the effective bandwidth over the network connection between the client and the server.

Bandwidth Adaptation Testing

Bandwidth adaptation test cases demonstrate the RVSP Client's capability to maintain high quality video under varying network bandwidth availability. The network bandwidth is first set to a target (constant) level, and a real-time video call session is initiated to demonstrate how the RTA sub-system allows the Client to adapt to the initial network capacity (FIG. 17). Next, the network bandwidth is varied during the video call session to demonstrate how the RTA sub-system allows the Client to track and adapt to variations in the network capacity (FIG. 18).

Jitter Adaptation Testing

Network jitter presents a particular challenge for video. In this test case we induce a jitter of up to 50 ms, and show how the Client continues to deliver high quality video. Quantitative results are produced by the Client, and can be observed using Eclipse's logging facility (by connecting the device under test to an Eclipse-enabled PC). The Client reports the total number of audio and video packets that were found out of order, and the degree to which it was successful in sorting the out-of-order packets. For qualitative results, the video can be observed during the call while jitter is introduced. Video is never frozen. Further, the Client's saliency capability is used to refresh only the salient parts of the video when packets are lost. Additionally, by turning jitter on and off, the relative delay on Handset B is adjusted automatically. Hence the Client does not rely on a fixed buffer that introduces needless delay.

Packet Loss Adaptation Testing

All networks are prone to packet loss. This is a particular problem for wireless networks, or for use cases where the packets must traverse multiple network boundaries to reach the target destination. In this test case, we implement packet loss rates of up to 5% on the video communication path, and observe the resulting video quality. Since reducing the bandwidth can also cause the packet loss rate to vary, the Client bandwidth adaptation capability (ABA) is turned off for these tests. We turn off adaptation by selecting the menu button during a video call and clicking on the “ABA Off” button. The result of this test is qualitative only. Similar to the jitter test case, the video never freezes, and in the event of a packet loss, only salient parts of the video are refreshed, resulting in a more acceptable user experience.

Video Conferencing User Features

When deployed together, the RVSP Client and Server applications enable multiple participants to simultaneously create and share high-quality video with each other in real-time, with many key aspects of a face-to-face user experience.

FIG. 19 is an overview diagram of an all-software multi-user video conferencing system according to one embodiment of the present invention, with user-enabled voice activity detection.

FIG. 20 is an overview diagram of an all-software multi-user video conferencing system according to one embodiment of the present invention, with the moderator able to select a participant to be given “the floor” via display at maximum video frame size/frame rate on all participants' device displays.

FIG. 21 is an overview diagram of an all-software multi-user video conferencing system according to one embodiment of the present invention, with each participant able to select which other participant will be displayed at maximum video frame size/frame rate.

While several embodiments have been shown and described herein, it should be understood that changes and modifications can be made to the invention without departing from the invention in its broader aspects. For example, but without limitation, the present invention could be incorporated into a wide variety of electronic devices, such as feature phones, smart phones, tablets, laptops, PCs, video phones, personal telepresence endpoints, and televisions or video displays with external or integrated set-top-boxes (STBs). These devices may utilize a wide variety of network connectivity, such as 3G, 4G, WiFi, DSL, and broadband.

1. A real time communication platform, comprising, an application layer module interoperable on a processor associated with a mobile device having a memory and at least one camera, wherein the application layer module comprises at least one session control module; a digital technology media engine in communication with the application layer module and at least one media source accessible to the processor of the mobile device, wherein the digital technology media engine includes at least one codec; and a real time adaptation sub-system in communication with the application layer module and the processor, wherein the real time adaptation sub-system is capable of detecting and adapting to variations in one or more conditions to which at least one of the processor, memory, or the at least one camera is subjected.
2. The system of claim 1, wherein the one or more conditions that the real time adaptation sub-system is capable of detecting relate to at least one variety of network impairment.
3. The system of claim 2, wherein the network impairment relates to packet delay.
4. The system of claim 2, wherein the network impairment relates to network congestion.
5. The system of claim 1, wherein the one or more conditions that the real time adaptation sub-system is capable of detecting relate to at least one device impairment.
6. The system of claim 5, wherein the device impairment relates to a frame rate associated with one of the at least one cameras.
7. The system of claim 5, wherein the device impairment relates to processor loading time of the processor.
8. The system of claim 5, wherein the device impairment relates to limitations associated with a forward-facing one of the at least one cameras.
9. The system of claim 1, wherein the application layer module is embedded within a browser installed on the mobile device.
10. The system of claim 1, wherein the application layer module is capable of communicating information related to a codec to a different mobile device.
11. The system of claim 1, wherein the session control module is capable of performing device registration operations.
12. A system for real time communication comprising, a client application installed within a mobile device having a memory and at least one camera, wherein the client application comprises: an application layer module overlaying the mobile device and comprising at least one session control module; a digital technology media engine in communication with the application layer module and at least one media source accessible to the processor of the mobile device, wherein the digital technology media engine includes at least one codec; a real time adaptation sub-system in communication with the application layer module and the processor, wherein the real time adaptation sub-system is capable of detecting and adapting to variations in one or more conditions to which at least one of the processor, memory, or the at least one camera is subjected; and a plurality of server applications installed on a server.
13. The system of claim 12, wherein the client application is embedded within a browser installed on the mobile device.
14. The system of claim 12, wherein the server comprises a video gateway.
15. The system of claim 12, wherein the server comprises a multi-point control unit.
16. The system of claim 12, wherein a first server application provides a transcoding functionality.
17. The system of claim 16, wherein a second server application enables real time video editing.
18. The system of claim 12, wherein the server is a cloud-based server.
19. The system of claim 12, wherein the session control module is capable of performing device registration operations.
20. A method of providing real time communication, comprising: deploying a client application to a mobile device having a memory and at least one camera, wherein the client application comprises: an application layer module capable of being interoperable on a processor associated with the mobile device and comprising at least one session control module; a digital technology media engine capable of communicating with the application layer module and at least one media source accessible to the processor of the mobile device, and including at least one codec; and a real time adaptation sub-system capable of communicating with the application layer module and the processor, and further capable of detecting and adapting to variations in one or more conditions to which at least one of the processor, memory, or the at least one camera is subjected.
21. The method of claim 20, wherein the client application is embedded within a web browser application.
22. The method of claim 20, wherein the client application is deployed in association with a web browser application.
23. The method of claim 20, wherein subsequent to being deployed in the mobile device, the client application communicates with a different mobile device.
24. The method of claim 23, wherein the communication is with a web browser application installed on the different mobile device.
25. The method of claim 23, wherein the communication includes information related to a codec.
26. The method of claim 20, wherein the one or more conditions that the real time adaptation sub-system is capable of detecting relate to at least one variety of network impairment.
27. The method of claim 26, wherein the network impairment relates to packet delay.
28. The method of claim 26, wherein the network impairment relates to network congestion.
29. The method of claim 20, wherein the one or more conditions that the real time adaptation sub-system is capable of detecting relate to at least one variety of device impairment.
30. The method of claim 29, wherein the device impairment relates to frame rate associated with one of the at least one cameras.
31. The method of claim 29, wherein the device impairment relates to processor loading time of the processor.
32. The method of claim 29, wherein the device impairment relates to limitations associated with a forward-facing one of the at least one cameras.